
feat(fabric): support interactive notebook-user auth for Fabric Warehouse + OneLake staging #3872

Open
mattiasthalen wants to merge 16 commits into dlt-hub:devel from mattiasthalen:feat/3865-fabric-notebook-user-auth

Conversation

@mattiasthalen
Contributor

Description

Makes dlt.pipeline(destination="fabric", staging="filesystem") usable from inside a Microsoft Fabric Python notebook under interactive user identity. Adds:

  • access_token and azure_credential fields on FabricCredentials. When either is set, FabricSqlClient.open_connection passes the bearer token to pyodbc.connect via attrs_before={1256: ...} (SQL_COPT_SS_ACCESS_TOKEN) and the DSN omits AUTHENTICATION/UID/PWD. SP path unchanged.
  • New OneLakeNotebookIdentityCredentials class for filesystem staging. Returns adlfs kwargs with account_name/account_host only, letting Fabric's registered OnelakeFileSystem.make_credential() provide notebook-user identity. Only works inside a Fabric notebook kernel.
  • Defensive short-circuit on FabricCopyFileLoadJob._ensure_fabric_token_initialized when the staging SP secret is empty — previously this built a ClientSecretCredential from dummy fields and raised ClientAuthenticationError before any data moved.

Related Issues

Closes dlt-hub#3865

Additional Context

  • Tests are mocked — pyodbc, azure-identity, requests stubbed in sys.modules. Pre-submission validation will run the real PR branch inside a live Fabric tenant with a two-run scd2 smoke test; verification output will be attached before flipping out of draft.
  • Draft until all implementation tasks complete.

mattiasthalen and others added 16 commits April 15, 2026 09:22
OneLake (Microsoft Fabric) responds with 403 ClientAuthenticationError
when BlobClient.exists targets a blob name ending in /. That kills
FilesystemClient.initialize_storage at the very first fs.isdir call
on self.dataset_path. Non-OneLake backends silently treat it as False
and hit the same latent defect, just non-fatally. Strip the empty
segment from the pathlib.join so dataset_path never ends in /.

Refs dlt-hub#3866
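The fix these trailing-separator commits describe can be sketched as a tiny helper (hypothetical name; the real change lives inside FilesystemClient's path joining, which uses the target filesystem's own pathlib separator — posixpath is used here for illustration):

```python
import posixpath


def join_without_trailing_sep(base: str, segment: str) -> str:
    """Join path segments and strip any trailing separator, so that
    BlobClient.exists-style probes on OneLake never receive a blob
    name ending in '/' (which OneLake rejects with a 403)."""
    return posixpath.join(base, segment).rstrip("/")
```

For example, `join_without_trailing_sep("ws/lh/Files", "dataset/")` yields `"ws/lh/Files/dataset"` rather than the old shape with a trailing `/`.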
Black wants the multi-line assert in test_dataset_path_has_no_trailing_separator
reformatted into a single-line assert. Apply the formatter's output so
`make format-check` passes in CI.

Refs dlt-hub#3866
Same OneLake 403 root cause as the previous commit on dataset_path,
one level deeper. FilesystemClient.truncate_tables calls
fs.exists(table_dir) for each entry from get_table_dirs(...); on
OneLake, once dataset_path is fixed, that call 403s on every table.
Drop the trailing pathlib.sep so get_table_dir returns a path shape
that BlobClient.exists accepts.

Refs dlt-hub#3866
…nt paths

Tasks 2 and 3 of this PR (dlt-hub#3867) stripped the trailing separator from
`FilesystemClient.dataset_path` and `FilesystemClient.get_table_dir`.
The pre-existing `test_trailing_separators` hardcoded the old shape
(trailing /) in its parameterized assertions. Flip those seven
assertions to the corrected shape.

Also drop the stale "ending with separator" phrase from
`get_table_dir`'s docstring — same invariant flip, land together.

`get_table_prefix` is untouched and still preserves its trailing
separator for folder-style layouts; the two assertions on that
method stay as-is.

Refs dlt-hub#3866
Task 2 of this PR (dlt-hub#3867) stripped the trailing separator from
`FilesystemClient.dataset_path`. The pre-existing
`test_destination_config_in_name` assertion at line 218 was
`endswith(dataset_name + pathlib.sep)`, which encoded the old shape.
Replace with `endswith(dataset_name)` and drop the now-unused
`pathlib` local variable (and its `type: ignore` comment).

Caught by `make test-common-p`, not surfaced by the filesystem
test module run in Task 4 because this test lives under
`tests/destinations/`.

Refs dlt-hub#3866
Introduces an optional `access_token` field on `FabricCredentials`
that holds a pre-fetched AAD bearer token, and a `get_access_token()`
helper that returns it as a raw string or `None`. This is the first
piece of notebook-user identity support — a subsequent commit will
add an injectable `TokenCredential` path, and the DSN builder and
`FabricSqlClient.open_connection` will start branching on
`get_access_token()` later in the PR.

Refs dlt-hub#3865
Adds an optional `azure_credential: TokenCredential` field on
`FabricCredentials` and teaches `get_access_token()` to call
`get_token("https://database.windows.net/.default")` on it when the
raw `access_token` is not set. This gives long-running notebook
sessions a refreshing credential path while keeping the one-shot
`access_token` string path for simple cases.

The field uses `Optional[Any]` at runtime because dlt's `configspec`
decorator does not support forward-referenced types; the docstring
documents the expected `TokenCredential` protocol.

Refs dlt-hub#3865
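The two token paths described above can be sketched as follows (FabricCredentialsSketch is an illustrative stand-in, not the real configspec class; the scope string is the one named in the commit):

```python
from typing import Any, Optional

AZURE_SQL_SCOPE = "https://database.windows.net/.default"


class FabricCredentialsSketch:
    """Illustrative subset of the credential fields described above."""

    def __init__(self, access_token: Optional[str] = None,
                 azure_credential: Optional[Any] = None) -> None:
        self.access_token = access_token
        # any object satisfying the TokenCredential protocol:
        # get_token(scope) -> object with a .token attribute
        self.azure_credential = azure_credential

    def get_access_token(self) -> Optional[str]:
        # the raw one-shot token wins when set
        if self.access_token:
            return self.access_token
        # refreshing path for long-running notebook sessions
        if self.azure_credential is not None:
            return self.azure_credential.get_token(AZURE_SQL_SCOPE).token
        return None
```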
…tive

`get_odbc_dsn_dict` now checks `get_access_token()` and skips
`AUTHENTICATION`/`UID`/`PWD` when a bearer token is available.
SP path is unchanged — the existing regression tests for
`ActiveDirectoryServicePrincipal` and SP credential derivation
remain green.

Refs dlt-hub#3865
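A minimal sketch of this branching (hypothetical free-function form; the real method is `get_odbc_dsn_dict` on the credentials class, and the token itself travels separately via `attrs_before`):

```python
from typing import Optional


def build_odbc_dsn_dict(host: str, database: str,
                        user: Optional[str] = None,
                        password: Optional[str] = None,
                        bearer_token: Optional[str] = None) -> dict:
    """When a bearer token is available, omit AUTHENTICATION/UID/PWD
    from the DSN; ODBC Driver 18 rejects mixing those keywords with
    SQL_COPT_SS_ACCESS_TOKEN. The SP path is unchanged."""
    dsn = {
        "DRIVER": "{ODBC Driver 18 for SQL Server}",
        "SERVER": host,
        "DATABASE": database,
    }
    if bearer_token is None:
        dsn["AUTHENTICATION"] = "ActiveDirectoryServicePrincipal"
        dsn["UID"] = user
        dsn["PWD"] = password
    return dsn
```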
`FabricCredentials.on_partial` previously attempted to fall back to
`DefaultAzureCredential` when explicit SP credentials were missing.
That fallback is not supported inside Fabric notebooks. Skip it when
either `access_token` or `azure_credential` is set. SP path unchanged.

Refs dlt-hub#3865
`FabricSqlClient.open_connection` now branches on
`credentials.get_access_token()`. When a bearer token is available, it
packs the token into the little-endian UTF-16 struct ODBC Driver 18
expects for `SQL_COPT_SS_ACCESS_TOKEN` (1256) and passes it via
`pyodbc.connect(..., attrs_before={1256: ...})`. The datetimeoffset
output converter and autocommit-on behavior are preserved. When no
token is available, the call falls through to the parent path.

Six mocked tests cover the struct layout, attrs_before kwarg,
fall-through, autocommit, output converter, and _conn caching.

Refs dlt-hub#3865
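The token packing follows Microsoft's documented ACCESSTOKEN layout for the ODBC driver: a 4-byte little-endian length prefix followed by the token encoded as UTF-16-LE. A sketch (the `pyodbc.connect` call is shown only as a comment, since it needs a live driver):

```python
import struct

SQL_COPT_SS_ACCESS_TOKEN = 1256  # from msodbcsql.h


def pack_access_token(token: str) -> bytes:
    """Pack an AAD bearer token into the struct ODBC Driver 18 expects
    for SQL_COPT_SS_ACCESS_TOKEN: <length: uint32 LE><token: UTF-16-LE>."""
    raw = token.encode("utf-16-le")
    return struct.pack(f"<I{len(raw)}s", len(raw), raw)


# usage sketch:
# conn = pyodbc.connect(dsn, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(tok)})
```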
New credential class that returns adlfs kwargs with `account_name`
and `account_host` only — no `credential` key. Lets Fabric's
registered `OnelakeFileSystem.__init__` fall through to its built-in
`make_credential()` helper for notebook-user identity. Only usable
inside a Fabric notebook kernel.

Pairs with `FabricCredentials.access_token` on the warehouse side.

Refs dlt-hub#3865
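A sketch of the behavior (a simplified stand-in for the new class; the point is the deliberate absence of a `credential` key in the adlfs kwargs, so Fabric's registered filesystem supplies notebook-user identity itself):

```python
class OneLakeNotebookIdentitySketch:
    """Stand-in for OneLakeNotebookIdentityCredentials: yields adlfs
    keyword arguments with account_name/account_host only, and no
    'credential' entry, letting OnelakeFileSystem.make_credential()
    take over inside a Fabric notebook kernel."""

    def __init__(self, account_name: str = "onelake",
                 account_host: str = "onelake.blob.fabric.microsoft.com") -> None:
        self.account_name = account_name
        self.account_host = account_host

    def to_adlfs_credentials(self) -> dict:
        # deliberately no "credential" key
        return {"account_name": self.account_name,
                "account_host": self.account_host}
```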
…secret is empty

The Fabric API token warmup builds a `ClientSecretCredential` from
`credentials.azure_client_secret` and hits
`https://api.fabric.microsoft.com/.default` before every OneLake load.
When the SP secret is empty or None, this fails with
`ClientAuthenticationError`. Return early when the secret is falsy.
The real-SP happy path is unchanged.

Refs dlt-hub#3865
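The gate reduces to a falsy check in front of the warmup (a hypothetical free-function form of the `_ensure_fabric_token_initialized` change; `fetch_token` stands in for the ClientSecretCredential warmup call):

```python
from typing import Callable, Optional


def ensure_fabric_token_initialized(sp_secret: Optional[str],
                                    fetch_token: Callable[[str], str]) -> Optional[str]:
    """Skip the SP token warmup entirely when the secret is empty or
    None (the notebook-identity path), so no ClientAuthenticationError
    is raised from dummy SP fields. The real-SP path is unchanged."""
    if not sp_secret:
        return None  # nothing to warm up under notebook identity
    return fetch_token(sp_secret)
```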
Adds a "Notebook user identity" section under the Fabric destination
docs with raw `access_token` (one-shot) and injectable
`TokenCredential` (refreshing) patterns. Includes copy-pasteable
examples using `notebookutils.credentials.getToken("pbi")`.
Cross-links to the filesystem staging OneLake section.

Refs dlt-hub#3865

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ric notebooks

Adds an "OneLake under notebook identity" subsection with TOML and
Python config examples, a caution that the class is Fabric-notebook-only,
and a cross-link back to the Fabric destination notebook identity section.

Refs dlt-hub#3865
…type

Add `# type: ignore[arg-type]` on SimpleNamespace->typed-class calls
in test_fabric_sql_client.py and test_fabric_warmup_gate.py (standard
dlt pattern for test mocks). Add `# type: ignore[no-any-return]` on
the azure_credential.get_token().token return in configuration.py.
Drop void-function return-value captures in warmup gate tests.

Refs dlt-hub#3865

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s active

`on_partial` returned early when `access_token` was set but did not
call `self.resolve()`, leaving the credentials in a partial state.
The pipeline then received `None` for credentials and crashed with
`AttributeError: 'NoneType' object has no attribute 'database'`.

Mirror the existing SP fallback pattern: check `self.host and
self.database` and call `self.resolve()` before returning.

Caught during live Fabric tenant validation.

Refs dlt-hub#3865

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
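The shape of the fix, as a stand-alone sketch (PartialCredsSketch is illustrative, not the real FabricCredentials; the real code mirrors the existing SP fallback in on_partial):

```python
from typing import Optional


class PartialCredsSketch:
    """When a bearer token is present, on_partial must still verify
    host/database and call resolve(); returning early without
    resolve() left the credentials partial, and the pipeline then
    received None for credentials."""

    def __init__(self, host: Optional[str] = None,
                 database: Optional[str] = None,
                 access_token: Optional[str] = None) -> None:
        self.host, self.database, self.access_token = host, database, access_token
        self.resolved = False

    def resolve(self) -> None:
        self.resolved = True

    def on_partial(self) -> None:
        if self.access_token and self.host and self.database:
            self.resolve()  # the previously missing step
```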
@mattiasthalen
Contributor Author

Live verification against Fabric tenant

Ran the real PR branch inside a Microsoft Fabric Python notebook against a live tenant under interactive notebook-user identity. Two-run scd2 smoke test completed without error.

Branch: feat/3865-fabric-notebook-user-auth @ 65f383c2
Date: 2026-04-16
Environment: Microsoft Fabric Python notebook kernel, ODBC Driver 18, Python 3.12

Config (no monkey patch, no dummy SP fields):

os.environ["DESTINATION__FABRIC__CREDENTIALS__HOST"] = "<warehouse>.datawarehouse.fabric.microsoft.com"
os.environ["DESTINATION__FABRIC__CREDENTIALS__DATABASE"] = "destination"
os.environ["DESTINATION__FABRIC__CREDENTIALS__ACCESS_TOKEN"] = notebookutils.credentials.getToken("pbi")
os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "abfss://<ws-guid>@onelake.dfs.fabric.microsoft.com/<lh-guid>/Files/_dlt_stage"
os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__AZURE_STORAGE_ACCOUNT_NAME"] = "onelake"
os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__AZURE_ACCOUNT_HOST"] = "onelake.blob.fabric.microsoft.com"

Verification query result (Cell 6 — three-row scd2 history):

(1, 'Ada',  datetime.datetime(2026, 4, 16, 13, 21, 13, 169295), None)
(2, 'Bo',   datetime.datetime(2026, 4, 16, 13, 21, 13, 169295), datetime.datetime(2026, 4, 16, 13, 21, 30, 366215))
(2, 'Bo Q', datetime.datetime(2026, 4, 16, 13, 21, 30, 366215), None)

All three rows expected: id=1 unchanged active, id=2 original version closed at run-2 timestamp, new active version from run-2. Verifies:

  • Warehouse token auth via access_token + SQL_COPT_SS_ACCESS_TOKEN (no SP fields needed)
  • OneLake staging under notebook identity (Fabric's OnelakeFileSystem.make_credential() provides auth when no explicit credential kwarg is passed)
  • Defensive warmup short-circuit (no ClientAuthenticationError from dummy SP fields)
  • End-to-end COPY INTO + MERGE for scd2 disposition

Notes:

  • sys.exit(0) needed after %pip install for Fabric notebook kernel to pick up the new package — known Fabric quirk, not a dlt issue.
  • No OneLakeNotebookIdentityCredentials explicit config was needed — the env-var-only path (AZURE_STORAGE_ACCOUNT_NAME=onelake, AZURE_ACCOUNT_HOST=...) also works because dlt's configspec resolves AzureCredentialsWithoutDefaults and the Fabric kernel's registered OnelakeFileSystem handles the credential fallback. The new class is still valuable as explicit documentation of the pattern and for users who prefer programmatic config.

Verification summary

  • make test-common-p: 5562 passed, 60 skipped, 8 env/pre-existing failures (GitHub API rate limit, AI agent test, arrow-table schema contracts flake). Zero regressions from PR 2 changes.
  • make format: clean (1111 files unchanged).
  • make lint: exit 0 (mypy, ruff, flake8, bandit, docstrings, lockfile all clean).
  • Live Fabric tenant: three-row scd2 history confirmed.

Ready for review.

@mattiasthalen mattiasthalen marked this pull request as ready for review April 16, 2026 13:22
@rudolfix rudolfix self-assigned this Apr 29, 2026


Development

Successfully merging this pull request may close these issues.

Support interactive notebook-user auth for the fabric destination (Fabric Warehouse + OneLake staging)
